Tip: This page is generated from a Jupyter notebook. Some of the code is hidden and can be shown by clicking the Show Code button. To view the complete notebook, click the View on GitHub button above.

Introduction - (Motivation part)

Undoubtedly, the recent emergence and spread of the COVID-19 virus has affected the lives of billions of people worldwide in many ways. Governments have been under constant pressure to reduce social interaction in order to limit virus transmission. They have therefore introduced strict measures to face this severe situation, and these measures have a significant impact on everybody's life.

The economy was one of the major areas affected by those measures. Work culture had to change to comply with government directives, which pushed companies to move faster towards digitalisation. As a result, companies that were reluctant to make such changes faced serious financial problems, which in many cases forced them to reduce their workforce. For other companies, such as travel agencies or companies in the hospitality sector, the hit was even harder, since their profits depend entirely on people's need for entertainment, social exploration and so on. Many of them have completely or partially shut down their operations, leaving many people unemployed.

The above are common observations and may look discouraging and demotivating to many people. However, we cannot conclude how large this impact is on each country's overall economy without a more in-depth investigation of the actual facts.

We therefore decided to analyse data from a macroeconomic point of view in order to get a clearer understanding of how the virus has affected the economy. We start the study with a statistical analysis of the COVID-19 situation in countries around the globe and then narrow the analysis to the most impacted ones (the map above illustrates the confirmed cases worldwide). We then include financial data to explore whether the virus has had a significant impact on the economy of those countries.

To carry out the analysis we use COVID-19 data from a GitHub gist that is updated every day, which gives us a daily overview of the spread of the virus. The financial data are derived from the IMF, the OECD and other sources, which are listed at the end of the page. We chose those datasets because we believe they contain all the information needed to assess the financial situation of the countries under consideration.

To sum up, this study aims to draw conclusions about the economic consequences of COVID-19. Through interactive and annotated graphs, we want to give the intended audience all the information needed to understand the impact of COVID-19 on the economy in a simple and concise manner.

Data analysis and visualization

First of all, we show the current situation of COVID-19 with three maps of the confirmed, death and recovered cases in different countries.

We then introduce the COVID-19 data, and later in the study we go deeper into the economic data.

# collapse-hide
# data preparation: combine the reference dataset with the virus dataset to obtain the area code for the map plot
refrence = refrence.rename(columns={'Country_Region': 'Country/Region'})
most_recent_data = world_data[world_data['Date'] == world_data['Date'].max()]
most_recent_data = most_recent_data[['Date', 'Country/Region', 'Confirmed','Recovered','Deaths']]
grouped = most_recent_data.groupby('Country/Region').sum()

result = grouped.join(refrence.set_index('Combined_Key'), on='Country/Region')
result = result.fillna(value=0)
result['code3'] = result['code3'].astype(int)

# confirm map
confirmMap = alt.Chart(alt.topo_feature(data.world_110m.url, 'countries'), title='COVID-19 Confirm Overview').mark_geoshape(
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=result, key='code3', fields=['Confirmed'])
).encode(
    alt.Color('Confirmed:Q',
              scale=alt.Scale(domain=[0, result.Confirmed.max()/10], clamp=True), 
              legend=alt.Legend(format='')),
    tooltip = [alt.Tooltip('Confirmed:Q')]
).project(
    type='equirectangular'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)


# death map
deathMap = alt.Chart(alt.topo_feature(data.world_110m.url, 'countries'), title='COVID-19 Deaths Overview').mark_geoshape(
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=result, key='code3', fields=['Deaths'])
).encode(
    alt.Color('Deaths:Q',
              scale=alt.Scale(domain=[0, result.Deaths.max()/10], clamp=True), 
              legend=alt.Legend(format='')),
    tooltip = [alt.Tooltip('Deaths:Q')]
).project(
    type='equirectangular'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

# recover map
recoverMap = alt.Chart(alt.topo_feature(data.world_110m.url, 'countries'), title='COVID-19 Recovered Overview').mark_geoshape(
    stroke='#aaa', strokeWidth=0.25
).transform_lookup(
    lookup='id', from_=alt.LookupData(data=result, key='code3', fields=['Recovered'])
).encode(
    alt.Color('Recovered:Q',
              scale=alt.Scale(domain=[0, result.Recovered.max()/10], clamp=True), 
              legend=alt.Legend(format='')),
    tooltip = [alt.Tooltip('Recovered:Q')]
).project(
    type='equirectangular'
).properties(
    width=900,
    height=500
).configure_view(
    stroke=None
)

confirmMap
deathMap
recoverMap

COVID-19 analysis

In this section we dive deeper into the COVID-19 data to present the current situation of the virus through the numbers of confirmed, recovered and death cases. Then, with the help of interactive representations of those numbers, we try to understand the spread rate and distribution of COVID-19.

The following table shows a sample of the COVID-19 data. The dataset contains columns for the date, the country, the confirmed and recovered cases, and the overall deaths per country.

Date Country Confirmed Recovered Deaths
20565 2020-05-10 West Bank and Gaza 375 263 2
20566 2020-05-10 Western Sahara 6 5 0
20567 2020-05-10 Yemen 51 1 8
20568 2020-05-10 Zambia 267 117 7
20569 2020-05-10 Zimbabwe 36 9 4

Exploration analysis

In this section we perform a basic statistical analysis of the data in order to identify how the data are distributed among the columns and to detect any important patterns that might be useful in the further analysis.

First, we present the descriptive statistics of our dataset. In this way we can summarize the central tendency, dispersion and shape of the dataset's distribution.

In the table below, great differences can be observed in the maximum values among the case types. The standard deviation is quite high in all of them, which means that the data are spread out. We can also see that the means are very different. The overall average number of deaths across the countries is significantly lower than the averages of confirmed and recovered cases: the average death rate of the virus is 6.65% and the average recovery rate is 28.35%. However, some countries have been impacted more than others, so these rates are not equally distributed among them.

Confirmed Recovered Deaths
count 2.057000e+04 20570.000000 20570.000000
mean 5.388404e+03 1540.071415 357.036364
std 4.220631e+04 10052.542768 2922.326614
min 0.000000e+00 0.000000 0.000000
25% 0.000000e+00 0.000000 0.000000
50% 9.000000e+00 0.000000 0.000000
75% 3.870000e+02 40.000000 7.000000
max 1.329260e+06 216169.000000 79526.000000
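
As a minimal sketch of what these summary statistics capture, the snippet below computes the mean, median and standard deviation for a small, made-up list of case counts. The values are hypothetical and are not taken from the dataset. A few very large values pull the mean far above the median and inflate the standard deviation, which is the same pattern visible in the table above.

```python
import statistics

# Hypothetical cumulative case counts for a handful of country-days
# (illustrative only; not taken from the dataset).
confirmed = [0, 0, 3, 9, 40, 387, 5200, 130000]

mean = statistics.mean(confirmed)
median = statistics.median(confirmed)
# Sample standard deviation (n - 1 in the denominator), the same
# convention pandas uses in describe().
std = statistics.stdev(confirmed)

print(f"mean={mean:.1f}, median={median}, std={std:.1f}")
```

When the mean is much larger than the median, as here, most observations are small and a few extreme ones dominate the average.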

The figure below illustrates the confirmed cases per country; 187 countries are shown in total. We have set a threshold of 100,000 confirmed cases (red line in the graph). Countries with more than 100,000 cases are shown as red bars, countries with between 10,000 and 100,000 confirmed cases as orange bars, and the rest, with fewer than 10,000 cases, as blue bars.

It can clearly be observed that the threshold of 100,000 cases has been exceeded by Brazil, Italy, Spain, Iran, the UK, the USA, France, Germany, Russia and Turkey. Four of those countries (Spain, the USA, Italy and the UK) have crossed 200,000 cases, while the USA has reached the extreme figure of 1,300,000 cases. The above-mentioned countries account for 72.37% (by the day the report was written) of the total confirmed cases worldwide. Another 22.87% of the cases are found in countries with between 10,000 and 100,000 confirmed cases, while the remaining 4.76% is recorded in the rest of the countries.

Another interesting observation is that Italy, the USA, Germany, France and the United Kingdom (countries that have been hit hard by COVID-19) are among the seven largest advanced economies in the world according to the IMF. A potential impact of the virus on their economies could therefore directly affect the global economy.
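
The shares quoted above come from grouping countries into three ranges of confirmed cases and dividing each group's total by the worldwide total. The snippet below is a minimal sketch of that calculation in plain Python. The country names and counts are hypothetical placeholders, and the `bucket` helper is introduced here only for illustration.

```python
# Hypothetical cumulative confirmed cases per country (illustrative values only).
confirmed = {"A": 250_000, "B": 120_000, "C": 60_000, "D": 15_000, "E": 4_000, "F": 1_000}

def bucket(cases):
    """Assign a country to one of the three ranges used in the bar chart."""
    if cases > 100_000:
        return "above 100,000"
    if cases > 10_000:
        return "10,000 to 100,000"
    return "up to 10,000"

total = sum(confirmed.values())
shares = {}
for country, cases in confirmed.items():
    name = bucket(cases)
    # Accumulate each country's percentage of the worldwide total in its bucket.
    shares[name] = shares.get(name, 0.0) + 100 * cases / total

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of all confirmed cases")
```

Because every country falls into exactly one range, the three percentages always add up to 100.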

#collapse-hide

# multiple color support: the key is the plot color, the value is the confirmed cases range
colorDict = {
    'blue': (0, 10000),
    'orange': (10001, 100000),
    'red': (100001, 100000000)
}

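def addColorType(df, colorDict):
    # assign default color (a string, so the column keeps a single dtype)
    df['Color'] = 'blue'
    for key, val in colorDict.items():
        # both bounds are inclusive so that boundary values such as 0 or 10001 are not skipped
        df.loc[(df['Confirmed'] >= val[0]) & (df['Confirmed'] <= val[1]), 'Color'] = key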

        
Threshold = pd.DataFrame({'Threshold':[100000]})
# continuous coloring
domain = [10000, 100000, 100000000]
range_ = ['blue', 'red', 'green']
#summary of all the countries
# get the last day's data; confirmed cases are cumulative, so the last day's data includes all confirmed cases so far
# .copy() so that adding the Color column does not write into a view of full_clean_data
plotData = full_clean_data.loc[full_clean_data.Date == full_clean_data.Date.max()].copy()
addColorType(plotData, colorDict)
summary = alt.Chart(plotData).mark_bar().encode(
    x=alt.X('Country:O',sort='-y'),
    y=alt.Y("Confirmed:Q"),
    tooltip = [alt.Tooltip('Country'),
               alt.Tooltip('Confirmed')],
    # The highlight will be set on the result of a conditional statement
#     color=alt.Color('Confirmed', scale=alt.Scale(domain=domain, range=range_))
    color=alt.Color('Color', legend=None)
).properties(width=3000,height=400)

rule = alt.Chart(Threshold).mark_rule(color='red').encode(
    y=alt.Y('Threshold:Q'),tooltip = [alt.Tooltip('Threshold')]
               
)

(summary+rule)
72.18347811480072
22.886296629762448
17.784364841586225
26.986685403625316

The figures below illustrate the maximum values of each case type for the countries that have exceeded the threshold of 100,000 confirmed cases, in order to identify which countries have recorded the highest numbers of confirmed, recovered and death cases due to COVID-19. At first sight we can observe that the case types are not proportional to each other. For example, the countries with the most confirmed cases do not necessarily record the most deaths or recovered cases.

We also observe significant variation in how the cases are distributed among the countries. For instance, there is no consistency in how the cases increased in each country. This makes sense, as the number of deaths, recoveries etc. depends heavily on factors such as each country's health care system, the age of the population and so on, which are beyond the scope of this study.

# collapse-hide

# select several columns with a list (double brackets); tuple-style selection is not supported by recent pandas
group = full_clean_data.groupby('Country')[['Deaths','Confirmed','Recovered']].max().sort_values(by=['Deaths','Confirmed','Recovered'])
group = pd.DataFrame(group)
group = group.reset_index()
# keep only the countries with at least 100000 confirmed cases
new_group = group.query("Confirmed >= 100000")
countries = list(new_group.Country.unique())

#define colors
red = alt.value('#f54242')
green = alt.value('#137E2A')
black = alt.value('#050404')

#presenting the confirmed cases per country
bars = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Confirmed:Q',
    y=alt.Y("Country:O", sort='-x'),color = red
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Confirmed:Q',color =black
)

bars2 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Recovered:Q',
    y=alt.Y("Country:O", sort='-x'),color=green
)


text2 = bars2.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Recovered:Q',color=black
)

bars3 = alt.Chart(new_group).mark_bar(size=5).encode(
    x='Deaths:Q',
    y=alt.Y("Country:O", sort='-x'),color=black
)


text3 = bars3.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='Deaths:Q',color=black
)



laydermap = (bars + text).properties(width= 250,height=300)|(bars2+text2).properties(width= 250,height=300)|(bars3+text3).properties(width=250,height=300)
laydermap.configure_axis(grid=False).configure_view(strokeWidth=0)

A calculation of the overall deaths and recovered cases shows that the 10 countries under consideration account for 79.95% of the deaths and 66.51% of the recovered cases. Countries that record between 10 and 100 thousand cases account for 17.78% of the deaths and 26.98% of the recovered cases. The remaining 2.26% of deaths and 6.51% of recovered cases are found in countries with fewer than 10 thousand cases.

It can be concluded that, as a group, the countries with more total cases also have more total deaths. This does not hold when we look at countries individually, as we saw above. Even though many countries have been hit by COVID-19, only 10 have been affected significantly, while the rest have suffered a moderate to low impact in terms of deaths.

The 10 countries explain the 66.51846889585445 % of the overall recovered cases
The 10 countries explain the 79.95790814070706 % of the overall death cases

Data analysis of the major countries

Following the findings of the preliminary exploratory analysis, we focus on the 10 countries that have been affected the most by COVID-19.

To extract as much information as possible, several datasets need to be combined. By doing so, we add columns for daily new cases, new deaths and new recovered cases. These columns show how much the corresponding cumulative counts changed compared to the day before.

In addition, missing values must be identified and treated to bring the dataset into a form ready for analysis. In the present study the missing values were filled with zeros. We considered this the best treatment because filling them with, for example, the mean, mode or median could lead to a false interpretation of the results.
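
To make the concern about mean filling concrete, the snippet below gives one hedged illustration. It assumes the missing values sit at the start of a cumulative series, before a country begins reporting; the numbers are hypothetical. Filling with zeros keeps the series non-decreasing, while filling with the mean of the observed values makes the cumulative count appear to drop.

```python
# Hypothetical cumulative confirmed counts; None marks days with no report.
series = [None, None, 2, 5, 11, 30]

observed = [v for v in series if v is not None]
mean_value = sum(observed) / len(observed)

zero_filled = [0 if v is None else v for v in series]
mean_filled = [mean_value if v is None else v for v in series]

def is_non_decreasing(values):
    """A cumulative count should never go down from one day to the next."""
    return all(a <= b for a, b in zip(values, values[1:]))

print(zero_filled, is_non_decreasing(zero_filled))
print(mean_filled, is_non_decreasing(mean_filled))
```

With the mean fill, the first two days show 12 cases followed by 2, which would later be read as a negative number of new cases.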

The following tables show first a sample of the final COVID-19 dataset after preprocessing, and then the descriptive statistics of that dataset.

# collapse-show
# data processing to create Active, New cases, New deaths, New recovered
full_clean_data['Active'] = full_clean_data['Confirmed'] - full_clean_data['Recovered'] - full_clean_data['Deaths']

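# .copy() so that the new columns below are written to an independent frame, not a view of full_clean_data
selected_data = full_clean_data[full_clean_data['Country'].isin(countries)].copy()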

for i in selected_data.index:
    date = selected_data.loc[i, 'Date']
    country = selected_data.loc[i, 'Country']
    date = datetime.strptime(date, '%Y-%m-%d')
    yesterday = datetime.strftime(date - timedelta(1), '%Y-%m-%d')
    yesterdayData = selected_data.loc[(selected_data.Date == yesterday) & (selected_data.Country == country)]
    if len(yesterdayData) <= 0:
        selected_data.loc[i, 'New cases'] = 0
        selected_data.loc[i, 'New deaths'] = 0
        selected_data.loc[i, 'New recovered'] = 0
        continue
    yesterdayData = yesterdayData.iloc[0]
    selected_data.loc[i, 'New cases'] = selected_data.loc[i, 'Confirmed'] - yesterdayData.Confirmed
    selected_data.loc[i, 'New deaths'] = selected_data.loc[i, 'Deaths'] - yesterdayData.Deaths
    selected_data.loc[i, 'New recovered'] = selected_data.loc[i, 'Recovered'] - yesterdayData.Recovered

selected_data = selected_data.fillna(value=0)
selected_data['New cases'] = selected_data['New cases'].astype(int)
selected_data['New deaths'] = selected_data['New deaths'].astype(int)
selected_data['New recovered'] = selected_data['New recovered'].astype(int)
Date Country Confirmed Recovered Deaths Active New cases New deaths New recovered
20522 2020-05-10 Russia 209688 34306 1915 173467 11012 88 2390
20540 2020-05-10 Spain 224350 136166 26621 61563 772 143 2214
20555 2020-05-10 Turkey 138657 92691 3786 42180 1542 47 3211
20556 2020-05-10 US 1329260 216169 79526 1033565 19710 731 3635
20560 2020-05-10 United Kingdom 220449 1002 31930 187517 3924 268 1
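
The daily columns in this table are day-over-day differences of the cumulative counts, computed separately for each country. The snippet below is a minimal plain-Python sketch of that idea, assuming one row per country per consecutive day and rows sorted by date. The rows are hypothetical. In pandas, the same idea is commonly written as `groupby('Country')['Confirmed'].diff()` followed by `fillna(0)`.

```python
from collections import defaultdict

# Hypothetical rows of (date, country, cumulative confirmed), sorted by date.
rows = [
    ("2020-05-08", "X", 100), ("2020-05-08", "Y", 40),
    ("2020-05-09", "X", 130), ("2020-05-09", "Y", 40),
    ("2020-05-10", "X", 175), ("2020-05-10", "Y", 52),
]

previous = {}                  # last cumulative value seen per country
new_cases = defaultdict(list)  # country -> list of (date, new cases)

for date, country, confirmed in rows:
    # The first row of a country has no previous day, so its increment is 0,
    # the same convention as in the loop above.
    delta = confirmed - previous[country] if country in previous else 0
    new_cases[country].append((date, delta))
    previous[country] = confirmed

print(dict(new_cases))
```

Keeping only the previous value per country avoids searching the whole table for yesterday's row on every iteration.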

The table below shows the basic statistics for the 10 countries under study. The newly added columns appear less spread out than the existing ones. Additionally, the average number of new deaths per day is about 204, while new confirmed cases and new recovered cases average 2713 and 848 per day respectively. The maximum daily values are also quite high.

Confirmed Recovered Deaths Active New cases New deaths New recovered
count 1.094000e+03 1094.000000 1094.000000 1.094000e+03 1094.000000 1094.000000 1094.000000
mean 7.141185e+04 17477.282450 5321.648080 4.861292e+04 2713.258684 204.750457 848.500914
std 1.669993e+05 34893.817369 11381.750527 1.332066e+05 5781.380024 414.039433 1908.785605
min 0.000000e+00 0.000000 0.000000 0.000000e+00 0.000000 0.000000 0.000000
25% 5.000000e+00 0.000000 0.000000 3.000000e+00 0.000000 0.000000 0.000000
50% 2.953500e+03 47.500000 54.500000 2.662000e+03 517.000000 11.000000 0.000000
75% 9.255600e+04 16762.750000 4472.250000 5.112450e+04 3067.000000 187.750000 1200.500000
max 1.329260e+06 216169.000000 79526.000000 1.033565e+06 36188.000000 2612.000000 33227.000000

The chart below illustrates how the daily new cases are spread out over time (Note: by drawing a rectangle with the mouse in the upper graph you can select a period of time, and the plot underneath then shows the cumulative new cases for that selection).

Italy was the first country to show increasing COVID-19 cases, followed by France, Germany and Spain, with the rest of the countries following shortly after. Overall, between 15 February and 15 March the virus became present in all the countries. In Russia and Brazil the number of daily cases appears to be increasing. In Italy, Spain, Germany and, to a lesser extent, Turkey the virus appears to follow a decreasing trend. The same holds for France, although on some days the cases increase before decreasing again. The fluctuation is significant, considering that on one day 3000 cases were recorded and on the next merely 750 (look between 29 April and 1 May).

In the UK the daily cases have remained steady at a high level since 5 April, while in the USA the daily cases increased rapidly in the first days and have remained at a significantly high level ever since. In both the UK and the USA the daily cases do not seem likely to decrease soon.

This steady phase also seems to have lasted longer than in the countries that suffered first from the virus (e.g. Italy, Spain, France, Germany).

# collapse-hide
# plot
interval = alt.selection_interval()

circle = alt.Chart(selected_data, title='Spread and New Cases Over Time').transform_filter(
    alt.datum.Country != 'Iran').mark_circle().encode(
    x='monthdate(Date):O',
    y='Country',
    color=alt.condition(interval, 'Country', alt.value('lightgray')),
    size=alt.Size('New cases:Q',
        scale=alt.Scale(range=[0, 3000]),
        legend=alt.Legend(title='Daily new cases')
    ) 
).properties(
    width=1000,
    height=400,
    selection=interval
)

bars = alt.Chart(selected_data).mark_bar().encode(
    y='Country',
    color='Country',
    x='sum(New cases):Q'
).properties(
    width=1000
).transform_filter(
    interval
)

circle & bars

The graphs below illustrate the average daily cases across the countries. We can observe that the recovered cases are significantly higher than the deaths, apart from the UK, where the opposite holds. As discussed previously, the countries with the most confirmed cases do not necessarily show the most deaths and/or recovered cases.

# collapse-hide
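# select several columns with a list (double brackets); tuple-style selection is not supported by recent pandas
group2 = selected_data.groupby('Country')[['New deaths','New cases','New recovered']].mean().sort_values(by=['New deaths','New cases','New recovered'])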
group2 = pd.DataFrame(group2.round())
group2 = group2.reset_index()
# # keep only the countries with more than 100000 confirmed
new_group2 = group2

#define colors
red = alt.value('#f54242')
green = alt.value('#137E2A')
black = alt.value('#050404')

#presenting the average daily new cases per country
bars = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New cases:Q',
    y=alt.Y("Country:O", sort='-x'),color = red
)

text = bars.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New cases:Q',color =black
)

bars2 = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New recovered:Q',
    y=alt.Y("Country:O", sort='-x'),color=green
)


text2 = bars2.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New recovered:Q',color=black
)

bars3 = alt.Chart(new_group2).mark_bar(size=5).encode(
    x='New deaths:Q',
    y=alt.Y("Country:O", sort='-x'),color=black
)


text3 = bars3.mark_text(
    align='left',
    baseline='middle',
    dx=3  # Nudges text to right so it doesn't appear on top of the bar
).encode(
    text='New deaths:Q',color=black
)



laydermap = (bars + text).properties(width= 250,height=300)|(bars2+text2).properties(width= 250,height=300)|(bars3+text3).properties(width=250,height=300)
laydermap.configure_axis(grid=False).configure_view(strokeWidth=0)

This discrepancy among the cases across the countries means that the countries have different death, recovery and infection rates. The graph below shows these rates for each country individually, based on the cumulative totals of each case type.

#collapse-hide
#data preprocessing
#death rate
selected_data['DeathRate'] = selected_data['Deaths']/selected_data['Confirmed'] * 100
selected_data = selected_data.fillna(value=0)

#recovery rate
selected_data['RecoveryRate'] = selected_data['Recovered']/selected_data['Confirmed']*100
selected_data = selected_data.fillna(value=0)



#infection rate
population = {'Brazil':212559417,
              'Germany':82002000,
              'Russia':144005000,
              'Turkey':82000000,
             'France':65273511,
             'Italy':60461826,
             'Spain':46754775,
             'US':331002651,
             'United Kingdom':67886011,
             'Iran':83992949}


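# divide each row by the population of its own country; assigning the whole column
# inside a loop would leave every row divided by the population of the last country seen
selected_data['InfectionRate'] = selected_data['Confirmed'] / selected_data['Country'].map(population) * 100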

# A dropdown filter
countries = list(selected_data.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select",init={'Country': 'US'})


#plot  infection rate
filter_infectionrates = alt.Chart(selected_data, width=300, height=300, title='Infection Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('InfectionRate:Q', title= 'Infection Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('InfectionRate:Q')]
).add_selection(country_select).transform_filter(country_select)



# plot death rate
filter_deathrate = alt.Chart(selected_data, width=300, height=300, title='Death Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('DeathRate:Q', title= 'Death Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('DeathRate:Q')]
).add_selection(country_select).transform_filter(country_select)

# plot recovery rate
filter_recovery = alt.Chart(selected_data, width=300, height=300, title='Recovery Rate').mark_line().encode(
    alt.X('Date:T'),
    alt.Y('RecoveryRate:Q', title= 'Recovery Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('RecoveryRate:Q')]
).add_selection(country_select).transform_filter(country_select)
            
(filter_infectionrates | filter_deathrate | filter_recovery) 

Looking first at the infection rates, an increasing trend has been recorded in every country since the first day the virus appeared. More precisely, by the day the report was written, all countries apart from the USA recorded infection rates between 0.16% and 0.30%. These are relatively low percentages compared to their populations, which range from about 40 million (Spain) to about 200 million (Brazil). In the USA, on the other hand, the infection rate reaches 2%, significantly higher than in the rest of the countries.

Moving on to death rates, an increasing trend is again observed, although with more fluctuations. For example, in the case of Iran there is a spike around 20-23 February where a death rate of 100% is recorded. This is quite unusual, but the confirmed cases at that time were very few, so the spike might correspond to 1 or 2 deaths out of 1 or 2 cases. Similar anomalies are observed for France, where around the same period a sharp increase in the death rate is followed by a rapid decrease, after which the rate rises again at a steadier pace. The same holds for the USA. The highest death rates are recorded in France, Italy and Spain, with 15%, 14% and 12% respectively.

Finally, the recovery rates vary a lot among the countries. The highest recovery rate, 82%, is recorded in Germany, while the lowest is in the United Kingdom, at almost 0%. As with the death rates, a few anomalies are observed in the first days of the virus, when some countries record recovery rates of 100%. In Russia this appears to have lasted for almost a month, while a 0% death rate was recorded over the same period. This does not seem very plausible, since after 9 March the results for Russia followed the reverse course. It could be that some diagnoses were falsely recorded as COVID-19 recoveries.
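
The three rates discussed above are simple ratios, which is why very small counts can produce spikes such as a 100% death rate. The snippet below is a minimal sketch with hypothetical numbers for a single country; the `rate` helper and the population figure are assumptions made only for this illustration.

```python
# Hypothetical snapshots for one country: (confirmed, deaths, recovered).
snapshots = [(2, 2, 0), (50, 3, 1), (10_000, 600, 4_000)]
population = 60_000_000  # hypothetical population, used for the infection rate

def rate(part, whole):
    """Percentage of `whole` represented by `part`; 0 when `whole` is 0."""
    return 100 * part / whole if whole else 0.0

for confirmed, deaths, recovered in snapshots:
    print(
        f"confirmed={confirmed:>6}  "
        f"death rate={rate(deaths, confirmed):6.2f}%  "
        f"recovery rate={rate(recovered, confirmed):6.2f}%  "
        f"infection rate={rate(confirmed, population):.4f}%"
    )
```

In the first snapshot, 2 deaths out of 2 confirmed cases already give a 100% death rate, so early spikes say little about the later course of the epidemic.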

#collapse-hide
base = alt.Chart(selected_data).mark_bar().encode(
    x='monthdate(Date):O',
).properties(width=500)

base.encode(y='Confirmed',color='Country').properties(title = 'Total confirmed') | base.encode(y='Deaths',color = 'Country').properties(title='Total deaths') | base.encode(y='Recovered',color = 'Country').properties(title='Total recovered')

Now we illustrate how COVID-19 has been distributed among the analysed countries. The first plot shows the relation between confirmed cases and deaths from the day of the first diagnosed case up to now. By moving the slider under the plot, the day-by-day increase in deaths can be observed. It is very interesting how many more deaths, compared to other countries, have been recorded in the USA in only 60 days (by the time the report was written).

# collapse-hide
# data processing
start_date = datetime.strptime('2020-01-22', '%Y-%m-%d')

for index, row in selected_data.iterrows():
    date = datetime.strptime(row['Date'], '%Y-%m-%d')
    selected_data.loc[index, 'Day'] = (date - start_date).days
    
selected_data['Day'] = selected_data['Day'].astype(int)
# plot
select_date = alt.selection_single(
    name='select', fields=['Day'], init={'Day': 0},
    bind=alt.binding_range(min=0, max=selected_data.Day.max(), step=1)
)
alt.Chart(selected_data, title='COVID-19 Spread Over Time').mark_point(filled=True).encode(
    alt.X('Confirmed', scale=alt.Scale(zero=False)),
    alt.Y('Deaths', scale=alt.Scale(zero=False)),
    alt.Size('Active'),
    alt.Color('Country'),
    alt.Order('Confirmed', sort='descending'),
    tooltip = [alt.Tooltip('Country'),
               alt.Tooltip('Confirmed'),
               alt.Tooltip('Deaths'),
               alt.Tooltip('Active')
              ],
).properties(
    width=700,
    height=400
).add_selection(select_date).transform_filter(select_date)
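
The slider in this chart is driven by a `Day` column that counts the days elapsed since the first date in the dataset. The snippet below is a minimal sketch of that conversion using only the standard library; the `day_index` helper is a name introduced here for illustration.

```python
from datetime import date

start_date = date(2020, 1, 22)  # first day in the dataset, as in the code above

def day_index(iso_date):
    """Number of days between start_date and an ISO 'YYYY-MM-DD' date string."""
    return (date.fromisoformat(iso_date) - start_date).days

print(day_index("2020-01-22"))
print(day_index("2020-05-10"))
```

An integer day index is convenient for a range slider because it can step by 1 and filter on exact equality.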

Macroeconomic

In this section we attempt an economic analysis from a macroeconomic point of view and, in relation to the COVID-19 analysis above, try to draw potential conclusions on how the spread of the virus has affected the global economy. A closer look at Denmark is given in this section as well.

Macroeconomics is a branch of economics that studies how an overall economy behaves (it focuses on the large scale). More precisely, macroeconomics studies economy-wide phenomena such as inflation, price levels, rate of economic growth, national income, gross domestic product (GDP), and changes in unemployment (Investopedia).

Stock Market

#collapse-hide
# preprocessing data
# France
stockCAC40['Symbol']='CAC 40'
CACbasic['Symbol'] = 'CAC Basic Materials'
CACconsumer['Symbol'] = 'CAC Consumer Goods'
CACservice['Symbol'] = 'CAC Consumer Services'
CACfinancial['Symbol'] = 'CAC Financials'
CACutilities['Symbol'] = 'CAC Utilities'
CACtech['Symbol'] = 'CAC Technology'
CAChealth['Symbol'] = 'CAC Health Care'
CACoil['Symbol'] = 'CAC Oil & Gas'
CACindustrial['Symbol'] = 'CAC Industrials'
cacall['Symbol'] = 'France All Shares'
stockFRA = pd.concat([stockCAC40,CACbasic,CACconsumer,CACservice,CACfinancial,CACutilities,CACtech,
                     CAChealth,CACoil,CACindustrial,cacall],sort = True)
stockFRA['Date'] = pd.to_datetime(stockFRA.Date)
stockFRA = stockFRA.sort_values(by=['Symbol','Date'])
stockFRA['Price'] = stockFRA['Price'].str.replace(',','')
stockFRA['Price'] = stockFRA['Price'].astype(float)

# Italy
stockMIB['Symbol']='MIB'
utilities['Symbol'] = 'FTSE Utilities'
Technology['Symbol'] = 'FTSE Technology'
O_G['Symbol'] = 'FTSE Oil & Gas'
Travel['Symbol'] = 'FTSE Travel & Leisure'
industrials['Symbol'] = 'FTSE Industrials'
financials['Symbol'] = 'FTSE Financials'
health['Symbol'] = 'FTSE Health Care'
chemicals['Symbol'] = 'FTSE Chemicals'
allsharesitalia['Symbol'] = 'Italy All Shares'
stockITA = pd.concat([stockMIB,utilities,Technology,O_G,Travel,
                     industrials,financials,health,chemicals,allsharesitalia],sort = True)
stockITA['Date'] = pd.to_datetime(stockITA.Date)
stockITA = stockITA.sort_values(by=['Symbol','Date'])
stockITA['Price'] = stockITA['Price'].str.replace(',','')
stockITA['Price'] = stockITA['Price'].astype(float)

# Spain
ibex['Symbol']='IBEX 35'
materials['Symbol'] = 'Basic Materials Industry and Construction'
consumer['Symbol'] = 'Consumer Goods'
service['Symbol'] = 'Services'
financial['Symbol'] = 'Financial Services & Real Estate'
petrol['Symbol'] = 'Petrol and Power'
technology['Symbol'] = 'Technology and Telecommunications'
spainall['Symbol'] = 'Spain All Shares'
stockSP = pd.concat([ibex,materials,consumer,service,financial,petrol,technology,spainall],sort = True)
stockSP['Date'] = pd.to_datetime(stockSP.Date)
stockSP = stockSP.sort_values(by=['Symbol','Date'])
stockSP['Price'] = stockSP['Price'].str.replace(',','')
stockSP['Price'] = stockSP['Price'].astype(float)

# UK
ftse100['Symbol']='FTSE 100'
auto['Symbol'] = 'Automobiles & Parts'
forestry['Symbol'] = 'Forestry & Paper'
metals['Symbol'] = 'Industrial Metals & Mining'
telecom['Symbol'] = 'Mobile Telecommunications'
realestate['Symbol'] = 'Real Estate'
#aerospace['Symbol'] = 'Aerospace & Defense'
beverage['Symbol'] = 'Beverages'
ukall['Symbol'] = 'United Kingdom All Shares'
chemicalsuk['Symbol'] = 'Chemicals'
construction['Symbol'] = 'Construction & Building Materials'
stockUK = pd.concat([ftse100,auto,forestry,metals,telecom,realestate,beverage,chemicalsuk,construction,ukall],sort = True)
stockUK['Date'] = pd.to_datetime(stockUK.Date)
stockUK = stockUK.sort_values(by=['Symbol','Date'])
stockUK['Price'] = stockUK['Price'].str.replace(',','')
stockUK['Price'] = stockUK['Price'].astype(float)

# Turkey
bist['Symbol']='BIST 100'
basictu['Symbol'] = 'Metals & Mining'
chemtu['Symbol'] = 'Chem Petrol Plastic'
electu['Symbol'] = 'Electricity'
foodtu['Symbol'] = 'Food & Beverages'
industrialstu['Symbol'] = 'Industrials'
financialstu['Symbol'] = 'Financial'
ittu['Symbol'] = 'Information Technology'
tourtu['Symbol'] = 'Tourism'
stockTU = pd.concat([bist,basictu,chemtu,electu,foodtu,financialstu,industrialstu,ittu,
                    tourtu],sort = True)
stockTU['Date'] = pd.to_datetime(stockTU.Date)
stockTU = stockTU.sort_values(by=['Symbol','Date'])
stockTU['Price'] = stockTU['Price'].str.replace(',','')
stockTU['Price'] = stockTU['Price'].astype(float)

# USA
dow30['Symbol']='Dow 30'
SP['Symbol'] ='S&P 500'
nasdaq['Symbol'] ='NASDAQ'
#bious['Symbol'] = 'Biotechnology'
banksus['Symbol'] = 'Banks'
financialsus['Symbol'] = 'Financials'
#healthus['Symbol'] = 'Health Care'
industrialsus['Symbol'] = 'Industrials'
insuranceus['Symbol'] = 'Insurance'
#internetus['Symbol'] = 'Internet'
computersus['Symbol'] = 'Computers'
telecomus['Symbol'] = 'Telecommunications'
transportationus['Symbol'] = 'Transportation'

stockUS = pd.concat([dow30,SP, nasdaq,banksus,financialsus,industrialsus,
                     insuranceus,computersus,
                    telecomus,transportationus],sort = True)
stockUS['Date'] = pd.to_datetime(stockUS.Date)
stockUS = stockUS.sort_values(by=['Symbol','Date'])
stockUS['Price'] = stockUS['Price'].str.replace(',','')
stockUS['Price'] = stockUS['Price'].astype(float)

# Germany
dax['Symbol']='DAX'
autogr['Symbol'] = 'Automobile'
chemicalsgr['Symbol'] = 'Chemicals'
#electricitych['Symbol'] = 'Electricity'
constructiongr['Symbol'] = 'Construction'
banksgr['Symbol'] = 'Banks'
consumergr['Symbol'] = 'Consumer'
financialsgr['Symbol'] = 'Financial'
foodgr['Symbol'] = 'Food & Beverages'
industrialgr['Symbol'] = 'Industrial'
stockGR = pd.concat([dax,autogr,chemicalsgr,constructiongr,banksgr,consumergr,financialsgr,
                    foodgr,industrialgr],sort = True)
stockGR['Date'] = pd.to_datetime(stockGR.Date)
stockGR = stockGR.sort_values(by=['Symbol','Date'])
stockGR['Price'] = stockGR['Price'].str.replace(',','')
stockGR['Price'] = stockGR['Price'].astype(float)

# Russia
moex['Symbol']='MOEX'
miningru['Symbol'] = 'Metals & Mining'
chemicalsru['Symbol'] = 'Chemicals'
electricityru['Symbol'] = 'Electricity'
oilru['Symbol'] = 'Oil & Gas'
transportru['Symbol'] = 'Transport'
consumerru['Symbol'] = 'Consumer'
financialsru['Symbol'] = 'Financial'
teleru['Symbol'] = 'Telecommunication'
stockRU = pd.concat([moex,miningru,chemicalsru,electricityru,oilru,transportru,
                    consumerru,financialsru,teleru],sort = True)
stockRU['Date'] = pd.to_datetime(stockRU.Date)
stockRU = stockRU.sort_values(by=['Symbol','Date'])
stockRU['Price'] = stockRU['Price'].str.replace(',','')
stockRU['Price'] = stockRU['Price'].astype(float)

# Brazil
bovespa['Symbol']='Bovespa'
basicbr['Symbol'] = 'Basic Materials'
electricalbr['Symbol'] = 'Electricity'
#electricitych['Symbol'] = 'Electricity'
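financialbr['Symbol'] = 'Industrial'    # label differs from the variable name; verify against the source file
industrialbr['Symbol'] = 'Gas & Water'  # label differs from the variable name; verify against the source file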
consumptionbr['Symbol'] = 'Consumption'
healthbr['Symbol'] = 'Health Care'
realestatebr['Symbol'] = 'Real Estate Investment & Services'
stockBR = pd.concat([bovespa,basicbr,electricalbr,industrialbr,consumptionbr,financialbr,healthbr,
                    realestatebr],sort = True)
stockBR['Date'] = pd.to_datetime(stockBR.Date)
stockBR = stockBR.sort_values(by=['Symbol','Date'])
stockBR['Price'] = stockBR['Price'].str.replace(',','')
stockBR['Price'] = stockBR['Price'].astype(float)

# add country column
stockFRA['Country']='France'
stockITA['Country']='Italy'
stockSP['Country']='Spain'
stockUK['Country']='UK'
stockUS['Country']='United States'
stockBR['Country']='Brazil'
stockGR['Country']='Germany'
stockRU['Country']='Russia'
stockTU['Country']='Turkey'
stocks = pd.concat([stockFRA,stockITA,stockSP,stockUK,stockUS,
                   stockBR,stockGR,stockRU,stockTU],sort = True)

#dropdown
countries = list(stocks.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select", init={'Country': 'United States'})


line = alt.Chart(stocks, title='Major Index & Primary Sectors Stocks Price (Major Countries)').mark_line(interpolate='basis',size=5).encode(
    x = 'Date',
    y = 'Price',
    color='Symbol',
    strokeDash='Symbol',
    tooltip = [alt.Tooltip('Symbol:N'),
               alt.Tooltip('Price:Q')]
).properties(width=600, height=500).add_selection(country_select).transform_filter(country_select)

line
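The per-country blocks above repeat the same four steps: label each table, concatenate, parse dates, and strip the thousands separators from the prices. A minimal sketch of how this could be factored into one helper follows. The function name `build_country_stocks` and the tiny demo tables are hypothetical and not part of the notebook; the sketch assumes every input table has a `Date` column and a string `Price` column, as the code above implies.

```python
import pandas as pd


def build_country_stocks(frames, country):
    """Label, stack and clean a set of index/sector price tables.

    frames  -- dict mapping a display label (e.g. 'DAX') to a DataFrame
               with 'Date' and 'Price' columns ('Price' as text, '1,234.50')
    country -- value written to the 'Country' column
    """
    # Attach the label to each table without modifying the originals.
    labelled = [df.assign(Symbol=symbol) for symbol, df in frames.items()]
    out = pd.concat(labelled, sort=True)
    out["Date"] = pd.to_datetime(out["Date"])
    # Drop thousands separators before converting to float.
    out["Price"] = (
        out["Price"].astype(str).str.replace(",", "", regex=False).astype(float)
    )
    out["Country"] = country
    return out.sort_values(by=["Symbol", "Date"]).reset_index(drop=True)


# Made-up placeholder values, only to show the call shape.
index_demo = pd.DataFrame(
    {"Date": ["2020-03-02", "2020-03-03"], "Price": ["1,000.00", "1,050.50"]}
)
sector_demo = pd.DataFrame(
    {"Date": ["2020-03-02", "2020-03-03"], "Price": ["85.00", "84.50"]}
)
stock_demo = build_country_stocks(
    {"DAX": index_demo, "Banks": sector_demo}, "Germany"
)
```

With such a helper each country would need a single call, and the final `pd.concat` over all countries would stay unchanged.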

GDP, inflation & unemployment data

Annual change rates of GDP, inflation and unemployment for the major countries, taken from IMF data, including the forecasts for 2020 and 2021.

I have removed Germany because we are not including it in the analysis above, and we still need data for the UK if possible.

# collapse-hide
# data preprocessing
def extract_data(df, subject):
    """Reshape one IMF indicator from wide (one column per year) to long format."""
    dates = ['2014', '2015', '2016', '2017', '2018', '2019', '2020', '2021']
    values = []
    countries = []
    _dates = []
    for country in df.Country.unique():
        tmp = df.loc[df.Country == country]
        for date in dates:
            countries.append(country)
            _dates.append(date)
            # extract the element first: float() on a one-element Series is deprecated in recent pandas
            values.append(float(tmp[date].iloc[0]))
    
    rv = pd.DataFrame.from_dict({'Date': _dates, 'Country': countries, 'Value': values})
    rv['subject'] = subject
    return rv

unemploy = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Unemployment rate']
unemploy = extract_data(unemploy, 'unemployment')
inflation = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Inflation, average consumer prices']
inflation = extract_data(inflation, 'inflation')
gdp = majorCountry.loc[majorCountry['Subject Descriptor'] == 'Gross domestic product, constant prices']
gdp = extract_data(gdp, 'gdp')

# A dropdown filter
countries = list(majorCountry.Country.unique())
country_dropdown = alt.binding_select(options=countries)
country_select = alt.selection_single(fields=['Country'], bind=country_dropdown, name="Select",init={'Country': 'United States'})

filter_gdp = alt.Chart(gdp, width=300, height=300, title='GDP Growth of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# unemployment plot
filter_unemployment = alt.Chart(unemploy, width=300, height=300, title='Unemployment Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)

# inflation plot
filter_inflation = alt.Chart(inflation, width=300, height=300, title='Inflation Change of Major Countries').mark_line(point=True).encode(
    alt.X('Date:T'),
    alt.Y('Value:Q', title= 'Growth Rate %'),
    color='Country',
    tooltip = [alt.Tooltip('Value:Q')]
).add_selection(country_select).transform_filter(country_select)


(filter_gdp | filter_unemployment | filter_inflation)
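The `extract_data` helper above turns the IMF table from wide format (one column per year) into long format (one row per country and year). As a hedged alternative, the same reshape can be sketched with `DataFrame.melt`. The function name `extract_data_long` and the demo table with countries "A" and "B" are hypothetical; the sketch assumes the year columns hold values that `float` can parse, and it sorts the result by country and year, which may differ from the original row order.

```python
import pandas as pd


def extract_data_long(df, subject, years):
    """Wide-to-long reshape: one row per (Country, year)."""
    long_df = df.melt(
        id_vars=["Country"],   # column kept as an identifier
        value_vars=years,      # year columns stacked into rows
        var_name="Date",
        value_name="Value",
    )
    long_df["Value"] = long_df["Value"].astype(float)
    long_df["subject"] = subject
    return long_df.sort_values(["Country", "Date"]).reset_index(drop=True)


# Made-up placeholder values, only to show the shape of the input.
wide_demo = pd.DataFrame(
    {
        "Country": ["A", "B"],
        "2019": ["1.5", "2.0"],
        "2020": ["-3.0", "-4.5"],
    }
)
long_demo = extract_data_long(wide_demo, "gdp", ["2019", "2020"])
```

The result has the same `Date`, `Country`, `Value` and `subject` columns that the charts above expect.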

Import Export plot

This chart shows the quarterly change (%) in imports and exports for the top countries.

It could also be switched to monthly data, but the resulting chart becomes too long.

#collapse-hide
# data preprocessing
trade = trade.replace({'Imports in goods (value)': 'Imports', 'Exports in goods (value)': 'Exports'})
# trade data doesn't include Spain
countries = ['Italy', 'United States', 'France', 'Germany', 'Turkey', 'United Kingdom', 'Russia', 'Brazil']
trade = trade.loc[trade.Country.isin(countries)]
quarterlyTrade = trade.loc[trade.Frequency == 'Quarterly']
monthlyTrade = trade.loc[trade.Frequency == 'Monthly']

# a dropdown
country_dropdown = alt.binding_select(options=countries)
realPercent = alt.binding_radio(options=['Percentage', 'US Dollar'])
country_select = alt.selection_single(name="Select",
                                      fields=['Country', 'Unit'], 
                                      bind={'Country': country_dropdown, 'Unit': realPercent},  
                                      init={'Country': 'United States', 'Unit': 'Percentage'})

alt.Chart(quarterlyTrade).mark_bar().encode(
    x='Subject:O',
    y=alt.Y('Value:Q', title='Change Percentage(%) or Billion'),
    color=alt.condition(
        alt.datum.Value > 0,
        alt.value("steelblue"),  # The positive color
        alt.value("orange")  # The negative color
    ),
    tooltip = [alt.Tooltip('Value:Q')],
    column=alt.Column('TIME:N', title='Date')
).add_selection(country_select).transform_filter(country_select)
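If the monthly view is wanted despite its length, one option is to keep only the most recent periods before plotting. The sketch below is an assumption, not part of the notebook: the helper `last_n_periods` is hypothetical, and it relies on the `TIME` labels sorting chronologically as strings (for example `2020-01`), which should be checked against the actual trade data.

```python
import pandas as pd


def last_n_periods(df, n, time_col="TIME"):
    """Keep only the rows that belong to the n most recent period labels."""
    # Assumes labels such as '2020-01' sort chronologically as plain strings.
    keep = sorted(df[time_col].unique())[-n:]
    return df.loc[df[time_col].isin(keep)]


# Made-up placeholder values, only to show the call shape.
monthly_demo = pd.DataFrame(
    {
        "Country": ["X"] * 4,
        "TIME": ["2019-11", "2019-12", "2020-01", "2020-02"],
        "Value": [1.0, 2.0, 3.0, 4.0],
    }
)
recent_demo = last_n_periods(monthly_demo, 2)
```

Passing `last_n_periods(monthlyTrade, 12)` to `alt.Chart` instead of `quarterlyTrade` would then limit the chart to twelve columns.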

GENRES

Visualization

Discussion

Contribution

References

  1. Investopedia